Poster: pt Date: Jan 26, 2005 7:21am
Forum: toronto Subject: Re: scanner credit?

|Hi Molly, I really appreciate your effort to
|answer all questions. But I don't know your
|background. Are you a librarian? Otherwise, you
|should use the librarians in Toronto for
|cataloging expertise. They should know all about
|things like editions, LoC, LCCN, OCLC, etc.

Hi, I'm Parker, an Archivist at the IA, and I spent six years in Library/Information schools (U.C. Berkeley and U. of Washington) in pursuit of my librarian cred (resume available upon request). I've been working with Molly (the project manager) and librarians at the U. of Toronto on metadata for the Canadian Libraries collection.

We went with the UofT catalogue records because we could be sure that there would be a minimal number of problems matching records, and they provided us great APIs for interacting with the catalogue to get the kind of data we wanted/needed for our purposes, specifically Dublin Core and MARC XML records.

I agree it would be great to have other kinds of records associated with the books as well, so that anyone with these records could get at the content. However, we're a small organization and have to make choices about where we spend our resources. If you or someone you know is interested in helping us associate other types of records with these books, please have them contact us. This would be great.

|There is even some new thinking (FRBR) going on
|in the cataloging world, with people like Roy
|Tennant at the University of California main
|office in Oakland, which is just across the Bay
|from the Internet Archive at the Presidio. I'm a
|C/UNIX programmer but when I hang around on
|librarian mailing lists I get comments such as
|"Brewster Kahle is not a librarian!!!" and that
|seems so unnecessary, when the IA could easily
|cover its reputation with a little cooperation
|and openness.

Please remind these people that, contrary to what the ALA might say, they do not own the word "librarian". Brewster founded and runs a (digital) library, where we archive and distribute books/music/video/software/data. In my mind, that is what makes him a librarian.

We're happy to work with librarians who have expertise, and freely admit we could use their advice and expertise. We work with librarians every day on various projects, but we're always glad to have more advice and help (please!). Again, we're a small organization and there are limits to what we can do without help.

I'd be happy to hear suggestions about ways in which we could be more open to the library community. We don't have any secrets that I know of :). Next time someone says we are not open, please ask them to contact me (ask for Parker).

|Also, Michael Hart has started to call for
|librarians to cooperate with him to produce MARC
|records for all Project Gutenberg e-texts, and
|this should perhaps be concerted with the Million
|Book and Canadian Libraries projects at the
|Internet Archive.

Where possible we fetch MARC XML records from the UofT catalogue. These are included with the scanned books and are available much more frequently for recently scanned books than for the early scans. To see them try:

or click the 'All files' link on the side and look for it there. For books without MARC XML records, I believe we'll be working with the folks at PG to figure out the easiest way to make our content work for them and vice versa.
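
For anyone who wants to pull fields out of one of these MARC XML files once it is downloaded, here is a rough sketch in Python. The sample record, its values, and the helper name `subfields` are made up for illustration and are not taken from our collection; the only things it relies on are the standard MARC21 "slim" XML namespace and the usual field tags (245 $a for the title, 010 $a for the LCCN).

```python
import xml.etree.ElementTree as ET

# Namespace used by MARC XML ("MARC21 slim") records.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical record with placeholder values, for illustration only.
SAMPLE_RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">example-0001</controlfield>
  <datafield tag="010" ind1=" " ind2=" ">
    <subfield code="a">00000000</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">by An Example Author.</subfield>
  </datafield>
</record>"""


def subfields(record, tag, code):
    """Return the text of every subfield `code` inside datafields `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [el.text for el in record.findall(path, MARC_NS)]


record = ET.fromstring(SAMPLE_RECORD)
title = subfields(record, "245", "a")  # 245 $a: title proper
lccn = subfields(record, "010", "a")   # 010 $a: LC control number
print(title, lccn)
```

A real file may wrap several records in a `collection` element, in which case you would loop over its `record` children first.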

|By the way, when I receive reminders about this
|discussion, the e-mail contains the URL
|
|with the double slash, that this website refuses
|to handle.

Thanks for the report. I'll see what I can do about fixing this error.
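
In the meantime, one possible workaround on the reader's side is to collapse the doubled slash before opening the link. This is only a sketch, not how the site itself handles URLs; the function name `collapse_slashes` and the example.org address are hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url):
    """Collapse runs of '/' in the path, leaving the 'scheme://' part alone."""
    parts = urlsplit(url)
    clean_path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=clean_path))


# Hypothetical URL, used only to show the behavior.
fixed = collapse_slashes("http://www.example.org//forum//post.php?id=1")
print(fixed)
```

Only the path is rewritten, so the query string and the double slash after the scheme are left untouched.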



Poster: molly Date: Jan 27, 2005 12:49am
Forum: toronto Subject: Re: scanner credit?

Actually, I'm not a librarian; I have an art degree. We are working very closely with the librarians in Toronto though, and one tends to pick up various acronyms for standards and metadata by hanging out with them. *grin*

We have everything to gain from being open and working with librarians, and we feel we are doing a pretty good job for an organization with 20 people in it. We pride ourselves on being open, being brutally honest, and developing and using open source software, so it's weird to hear that we have the opposite reputation. We depend on the outside community for help though, so if you would like to help make the book collection better by assigning LoC catalogue numbers to everything, we would love your expertise! I'm molly at archive dot org.



Poster: aronsson Date: Jan 27, 2005 5:16pm
Forum: toronto Subject: Re: scanner credit?

I'm looking forward to our cooperation. As for openness: In August 2004 I wrote a message on the Million Books discussion (Subject: German texts), and it took until December before someone (you, actually) answered. In those three months I didn't know if the project was dead, if you were busy doing other things, or if nobody wanted to communicate with me. According to other messages in that forum, other people had the same impression. I'm glad that this has changed, and hope you and Parker will continue to be around. You are doing many exciting things here, but I also see a lot of room for improvement. I have a lot of experience in receiving (harsh and undue) criticism, so I'll try my best to be constructive.

I've been running Project Runeberg, with Scandinavian e-texts since 1992 and digital facsimiles since 1998, with just a handful of unpaid core volunteers, so 20 people is a huge organization to me. You are free to reuse our content, which is in the public domain. We don't claim or reserve any rights to the digitization. (However, we don't want the name "Project Runeberg" to be reused by others.) At the very bottom of the title page for each of our books there is a "download" link that helps you retrieve a ZIP archive of 600dpi TIFF G4 images. Today this ZIP archive doesn't contain any OCR text or metadata, but we should improve this in the future.

Places that I hang around include the Web4lib, Bookpeople, and Diglib mailing lists, and the D-Lib Magazine website.